Julien Danjou: Extending Swift with middleware: example with ClamAV
In a pipeline, a black cat can become a white cat with the help of
some middleware.
To do our content analysis, the best place to hook in the Swift architecture
is at the beginning of every request, on swift-proxy, before the file is
actually stored on the cluster. Like many other OpenStack projects, the Swift
proxy uses Paste to build its HTTP architecture.
Paste uses WSGI and provides an architecture based on a pipeline. The
pipeline is composed of a succession of middleware, ending with one
application. Each middleware has the chance to look at the request or at the
response, can modify it, and then passes it on to the next middleware. The
last component of the pipeline is the real application: in this case,
the Swift proxy server.
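To make the idea concrete, here is a minimal sketch of such a pipeline using nothing but plain WSGI callables. The names `TagResponse`, `final_app` and `call_wsgi` are made up for illustration; Paste would build the chain from a configuration file instead.

```python
def final_app(env, start_response):
    """The application sitting at the end of the pipeline."""
    body = b"hello from the app"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class TagResponse(object):
    """Hypothetical middleware: inspects the request, tags the response."""

    def __init__(self, app):
        self.app = app  # next component in the pipeline

    def __call__(self, env, start_response):
        method = env.get("REQUEST_METHOD", "GET")  # look at the request

        def tagging_start_response(status, headers, exc_info=None):
            # Modify the response on its way back to the client.
            headers = list(headers) + [("X-Seen-Method", method)]
            return start_response(status, headers, exc_info)

        return self.app(env, tagging_start_response)


def call_wsgi(app, env):
    """Tiny helper that drives a WSGI callable and collects the result."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(env, start_response))
    return captured["status"], captured["headers"], body


pipeline = TagResponse(final_app)
status, headers, body = call_wsgi(pipeline, {"REQUEST_METHOD": "GET"})
```

The middleware never produces the body itself: it only wraps `start_response`, which is one common way for a filter to alter a response without knowing anything about the application behind it.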
If you've already deployed Swift, you have come across a default pipeline in the
swift-proxy.conf configuration file:
[pipeline:main]
pipeline = catch_errors healthcheck cache ratelimit tempauth proxy-logging proxy-server
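The names on this line are applied left to right: the first one is the outermost layer a request meets. The sketch below mimics that with a hypothetical `build_pipeline` helper and two toy middleware (none of this is Swift's or Paste's actual code), to show what swapping two entries changes.

```python
def proxy_server(env, start_response):
    """Stand-in for the application at the end of the pipeline."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"proxied"]


def healthcheck(app):
    """Toy healthcheck: answers /healthcheck itself, without calling app."""
    def middleware(env, start_response):
        if env.get("PATH_INFO") == "/healthcheck":
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"OK"]
        return app(env, start_response)
    return middleware


def toy_auth(app):
    """Toy auth: rejects any request lacking an X-Auth-Token header."""
    def middleware(env, start_response):
        if "HTTP_X_AUTH_TOKEN" not in env:
            start_response("401 Unauthorized",
                           [("Content-Type", "text/plain")])
            return [b"auth required"]
        return app(env, start_response)
    return middleware


def build_pipeline(filters, app):
    """Wrap app so that filters[0] ends up outermost, like the config line."""
    for make_filter in reversed(filters):
        app = make_filter(app)
    return app


def status_of(app, path):
    """Send an unauthenticated GET for path and return the status line."""
    seen = []
    app({"REQUEST_METHOD": "GET", "PATH_INFO": path},
        lambda status, headers, exc_info=None: seen.append(status))
    return seen[0]


good = build_pipeline([healthcheck, toy_auth], proxy_server)
bad = build_pipeline([toy_auth, healthcheck], proxy_server)
```

With `good`, an unauthenticated request for `/healthcheck` is answered before authentication is ever consulted; with `bad`, the same request is stopped by the auth layer first.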
This is a really basic pipeline with a few middleware. The first one catches errors; the second one returns a 200 OK response if you send a GET /healthcheck request to your proxy server. The third one is in charge of caching, the fourth one is used for rate limiting, the fifth for authentication, the sixth for logging, and the final one is the actual proxy server, in charge of proxying the request to the account, container, or object servers (the other components of Swift).
Of course, we could remove or add any of the middleware here at our convenience. Be aware that the order matters: for example, if you put healthcheck after tempauth, you won't be able to access the /healthcheck URL without being authenticated!
ClamAV
>>> import pyclamd
>>> pyclamd.init_unix_socket('/var/run/clamav/clamd.ctl')
>>> print pyclamd.scan_stream(pyclamd.EICAR)
{'stream': 'Eicar-Test-Signature(44d88612fea8a8f36de82e1278abb02f:68)'}
>>> print pyclamd.scan_stream("safe!")
None
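Judging from the two calls above, `scan_stream` returns `None` for clean data and a dictionary keyed by `'stream'` when something is found. A small helper built on that assumption could look like this; the name `detected_virus` is made up, and since it only inspects the return value it doesn't need a running clamd.

```python
def detected_virus(scan_result):
    """Return the signature name from a scan result, or None if clean.

    Assumes the result shape shown above: None, or {'stream': '<name>'}.
    """
    if not scan_result:
        return None
    return scan_result.get("stream")


clean = detected_virus(None)
infected = detected_virus(
    {"stream": "Eicar-Test-Signature(44d88612fea8a8f36de82e1278abb02f:68)"})
```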
Anatomy of a WSGI middleware
Your WSGI middleware should consist of a callable object. Usually this is done with a class implementing the __call__ method. Here's a basic boilerplate:
class SwiftClamavMiddleware(object):
    """Middleware doing virus scan for Swift."""

    def __init__(self, app, conf):
        # app is the final application
        self.app = app

    def __call__(self, env, start_response):
        return self.app(env, start_response)


def filter_factory(global_conf, **local_conf):
    conf = global_conf.copy()
    conf.update(local_conf)

    def clamav_filter(app):
        return SwiftClamavMiddleware(app, conf)
    return clamav_filter
I'm not going to expand on why it is built this way, but if you want more information on this kind of filter middleware, you can read the Paste documentation. As it stands, this middleware does nothing: it simply passes every request it receives to the application it wraps and returns the result.
Testing our basic middleware
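Before writing a proper unit test, it can help to poke at the factory by hand. The sketch below repeats the boilerplate (storing `conf` so it can be inspected) and then calls `filter_factory` roughly the way a Paste-style loader would: once with the configuration, then the returned function with the application. The configuration keys and `dummy_app` are invented for the example.

```python
class SwiftClamavMiddleware(object):
    """Same do-nothing boilerplate as above, keeping conf for inspection."""

    def __init__(self, app, conf):
        self.app = app
        self.conf = conf

    def __call__(self, env, start_response):
        return self.app(env, start_response)


def filter_factory(global_conf, **local_conf):
    conf = global_conf.copy()
    conf.update(local_conf)  # filter-specific settings win over globals

    def clamav_filter(app):
        return SwiftClamavMiddleware(app, conf)
    return clamav_filter


def dummy_app(env, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"FAKE APP"]


# Step 1: the loader hands over the configuration (keys are hypothetical).
make_filter = filter_factory({"user": "swift", "log_level": "INFO"},
                             log_level="DEBUG")
# Step 2: the loader wraps the next component of the pipeline.
middleware = make_filter(dummy_app)

statuses = []
body = b"".join(middleware({"REQUEST_METHOD": "GET"},
                           lambda s, h, e=None: statuses.append(s)))
```

The unit test below automates the same kind of check, letting webob handle the request plumbing.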
import unittest

from webob import Request, Response

# SwiftClamavMiddleware is the class defined above.


class FakeApp(object):
    """Fake WSGI application standing in for the Swift proxy server."""

    def __call__(self, env, start_response):
        return Response(body="FAKE APP")(env, start_response)


class TestSwiftClamavMiddleware(unittest.TestCase):

    def setUp(self):
        self.app = SwiftClamavMiddleware(FakeApp(), {})

    def test_simple_request(self):
        resp = Request.blank('/',
                             environ={
                                 'REQUEST_METHOD': 'GET',
                             }).get_response(self.app)
        self.assertEqual(resp.body, "FAKE APP")
We create a FakeApp class that represents a fake WSGI application. You could also use a real application, or write a fake application that looks like the one you want to test. It'll require more time, but your tests will be closer to reality.
Here we write the simplest test we can for our middleware. We're just sending a GET / request to it, so it passes the request to the final application and returns the result. It is transparent; it does nothing.
Now, with that solid base, we'll be able to add more features and test them incrementally.
Plugging ClamAV in
With our base ready, we can start thinking about how to plug ClamAV in. What we want to check here is the content of the file when it's uploaded. According to the OpenStack object storage API, a file upload is done via a PUT request, so we're going to limit the check to that kind of request. Obviously, more checks could be added, but we'll keep things simple here for the sake of clarity.
With WSGI, the content of the request is available in env['wsgi.input'] as an object implementing a file interface. We'll scan that stream with ClamAV to check for viruses.
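One of the extra checks hinted at above could be to scan only object uploads, since a PUT is also used to create containers. A minimal sketch, assuming the `/v1/<account>/<container>/<object>` path layout of the object storage API; `is_object_upload` is a hypothetical helper, not part of Swift.

```python
def is_object_upload(env):
    """Return True if the WSGI environ describes a PUT on an object path."""
    if env.get("REQUEST_METHOD") != "PUT":
        return False
    # '/v1/account/container/object' -> ['v1', 'account', 'container', 'object']
    # maxsplit=3 keeps slashes inside the object name intact.
    parts = env.get("PATH_INFO", "").lstrip("/").split("/", 3)
    return len(parts) == 4 and all(parts)


object_put = {"REQUEST_METHOD": "PUT",
              "PATH_INFO": "/v1/account/container/object"}
container_put = {"REQUEST_METHOD": "PUT",
                 "PATH_INFO": "/v1/account/container"}
object_get = {"REQUEST_METHOD": "GET",
              "PATH_INFO": "/v1/account/container/object"}
```

Only `object_put` would be handed to the scanner; the container creation and the download would go straight through.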
from cStringIO import StringIO

import pyclamd
from webob import Response


class SwiftClamavMiddleware(object):
    """Middleware doing virus scan for Swift."""

    def __init__(self, app, conf):
        pyclamd.init_unix_socket('/var/run/clamav/clamd.ctl')
        # app is the final application
        self.app = app

    def __call__(self, env, start_response):
        if env['REQUEST_METHOD'] == "PUT":
            # We have to read the whole content in memory because pyclamd
            # forces us to, but this is a bad idea if the file is huge.
            body = env['wsgi.input'].read()
            scan = pyclamd.scan_stream(body)
            if scan:
                return Response(status=403,
                                body="Virus %s detected" % scan['stream'],
                                content_type="text/plain")(env, start_response)
            # Reading consumed the stream: hand a fresh copy to the next
            # component so the upload still reaches the proxy server.
            env['wsgi.input'] = StringIO(body)
        return self.app(env, start_response)


def filter_factory(global_conf, **local_conf):
    conf = global_conf.copy()
    conf.update(local_conf)

    def clamav_filter(app):
        return SwiftClamavMiddleware(app, conf)
    return clamav_filter
That's it. We only check PUT requests, and if there's a virus in the file, we return a 403 Forbidden error with the name of the detected virus, entirely bypassing the rest of the middleware chain and the application. Then, we can simply test it.
import unittest
from cStringIO import StringIO

import pyclamd
from webob import Request, Response

# SwiftClamavMiddleware is the class defined above; these tests need a
# running clamd daemon listening on the socket it connects to.


class FakeApp(object):
    def __call__(self, env, start_response):
        return Response(body="FAKE APP")(env, start_response)


class TestSwiftClamavMiddleware(unittest.TestCase):

    def setUp(self):
        self.app = SwiftClamavMiddleware(FakeApp(), {})

    def test_put_empty(self):
        resp = Request.blank('/v1/account/container/object',
                             environ={
                                 'REQUEST_METHOD': 'PUT',
                             }).get_response(self.app)
        self.assertEqual(resp.body, "FAKE APP")

    def test_put_no_virus(self):
        resp = Request.blank('/v1/account/container/object',
                             environ={
                                 'REQUEST_METHOD': 'PUT',
                                 'wsgi.input': StringIO('foobar'),
                             }).get_response(self.app)
        self.assertEqual(resp.body, "FAKE APP")

    def test_put_virus(self):
        resp = Request.blank('/v1/account/container/object',
                             environ={
                                 'REQUEST_METHOD': 'PUT',
                                 'wsgi.input': StringIO(pyclamd.EICAR),
                             }).get_response(self.app)
        self.assertEqual(resp.status_code, 403)
The first test, test_put_empty, simulates an empty PUT request. The second one, test_put_no_virus, simulates a regular PUT request with a simple file containing no virus. Finally, the third and last test simulates the upload of a virus using the EICAR test file. This is a special test file that is recognized as a virus even though it's not a real one. Very handy for testing virus detection software!
Configuring Swift proxy
Our middleware is ready! We can configure Swift's proxy server to use it. We need to add the following lines to our swift-proxy.conf to teach it how to load the filter:
[filter:clamav]
paste.filter_factory = swiftclamav:filter_factory
We'll assume that our Python module is named swiftclamav here. Now that we've defined our filter and how to load it, we can use it in our pipeline:
[pipeline:main]
pipeline = catch_errors healthcheck cache ratelimit tempauth clamav proxy-logging proxy-server
Just before reaching the proxy-server, and after the user has been authenticated, the content will be scanned for viruses. It's important to put this after authentication, because otherwise we might scan content that will then get rejected by the tempauth module, thus scanning for nothing!
Beyond scanning
And voilà, we now have a simple middleware testing uploaded content and refusing infected files. We could enhance it with various other things, like configuration handling, but I'll leave that as an exercise for interested readers.
We didn't exploit it here, but note that you can also manipulate request headers and modify them if needed. For example, we could have added a header X-Object-Meta-Scanned-By: ClamAV to indicate that the file has been scanned by ClamAV.
You should now be able to build your own middleware doing whatever you want with uploaded data. Happy hacking!
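For readers who want a head start on the configuration and header ideas mentioned above, here is one possible shape for the middleware, sketched with the standard library only. The `clamd_socket` option name and the injectable `scanner` argument are assumptions made for the example (the scanner stands in for `pyclamd.scan_stream` so the code runs without a daemon), and the tagging relies on WSGI exposing a request header `X-Object-Meta-Scanned-By` as `HTTP_X_OBJECT_META_SCANNED_BY`.

```python
import io


class ScanningMiddleware(object):
    """Sketch: configurable socket path, pluggable scanner, header tagging."""

    def __init__(self, app, conf, scanner):
        self.app = app
        # Hypothetical option; falls back to the path used in this article.
        self.socket_path = conf.get("clamd_socket",
                                    "/var/run/clamav/clamd.ctl")
        self.scanner = scanner  # callable: bytes -> None or {'stream': name}

    def __call__(self, env, start_response):
        if env.get("REQUEST_METHOD") == "PUT":
            body = env["wsgi.input"].read()
            result = self.scanner(body)
            if result:
                message = ("Virus %s detected"
                           % result["stream"]).encode("utf-8")
                start_response("403 Forbidden",
                               [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(message)))])
                return [message]
            # Give the next component a fresh, unread copy of the body.
            env["wsgi.input"] = io.BytesIO(body)
            # Request header X-Object-Meta-Scanned-By, in WSGI spelling.
            env["HTTP_X_OBJECT_META_SCANNED_BY"] = "ClamAV"
        return self.app(env, start_response)


def fake_scanner(data):
    """Stand-in scanner that flags any payload containing b'EVIL'."""
    return {"stream": "Fake-Signature"} if b"EVIL" in data else None


def echo_app(env, start_response):
    """Fake application reporting what it received from the pipeline."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    tag = env.get("HTTP_X_OBJECT_META_SCANNED_BY", "").encode("utf-8")
    return [env["wsgi.input"].read() + b"|" + tag]


def put(app, payload):
    """Send a PUT with the given payload; return (status, body)."""
    statuses = []
    env = {"REQUEST_METHOD": "PUT", "wsgi.input": io.BytesIO(payload)}
    body = b"".join(app(env, lambda s, h, e=None: statuses.append(s)))
    return statuses[0], body


app = ScanningMiddleware(echo_app, {"clamd_socket": "/tmp/clamd.sock"},
                         fake_scanner)
```

Passing the scanner in as an argument is a design choice for the sketch: it keeps the unit tests independent from a running clamd, while a real deployment could wire in the pyclamd call from the factory.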